Snowflake

What is Snowflake?

Snowflake is a cloud-based data warehousing platform that provides data storage, analysis, and reporting. It is designed to be highly scalable, secure, and highly available, and supports a wide range of data types and sources.

Snowflake’s pay-per-use pricing model is a major shift in the cloud database ecosystem. Because costs track consumption, good tools for optimizing performance and cost are essential, including the costs associated with the underlying cloud platform, storage, and Kubernetes (K8s) dependencies.

Snowflake uses a unique architecture that separates storage and computation, allowing for high levels of performance and scalability. It also provides a SQL-based interface for querying data, making it easy for analysts and data scientists to work with. Key features are:

  • A fully managed SaaS (software as a service) that provides a single platform for data warehousing, data lakes, data engineering, data science, secure sharing and consumption of real-time / shared data.
  • Storage and compute are separated out of the box, with compute that scales on the fly, data sharing, data cloning, and support for third-party tools.
  • Enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings.

A Snowflake database is where an organization’s uploaded structured and semi-structured data sets are held for processing and analysis. Snowflake automatically manages all parts of the data storage process, including organization, structure, metadata, file size, compression, and statistics.


What type of database is Snowflake?

Snowflake is a cloud-based relational database management system (RDBMS) that supports the SQL language. It is a columnar database, meaning it stores data in columns rather than rows, allowing for more efficient data compression and faster query performance. Snowflake also incorporates elements of NoSQL databases, such as flexible data modeling and semi-structured data storage, making it a hybrid of both traditional relational and NoSQL databases.
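
The difference between row and column layouts can be illustrated with a small Python sketch. This is a toy model only, not how Snowflake stores data internally; the table, the values, and the `to_columns` helper are made up for illustration.

```python
# Toy illustration only: real columnar engines add compression, metadata
# and pruning on top of this idea. The sample orders are hypothetical.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole row is visited just to total one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

print(columns["region"])
print(total_from_rows == total_from_columns)
```

Values within one column share a type and often repeat (for example, the region codes above), which is one reason column layouts tend to compress well.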


How does Snowflake differ from traditional databases?

Snowflake differs from traditional databases in several key ways. Some of the most common differences are:

Architecture and Scalability
  • Traditional database: Often designed with a fixed architecture where hardware resources (CPU, memory, storage) need to be provisioned and managed. Scaling up or down can be complex and may involve downtime.
  • Snowflake: Modern architecture where compute resources and storage are separate. This separation allows for automatic and elastic scaling of compute resources, meaning you can allocate more or fewer resources as needed without impacting the underlying data.

Cloud-Native Approach
  • Traditional database: Typically hosted on-premises or on dedicated servers, which require significant maintenance, administration, and hardware management.
  • Snowflake: A cloud-native platform, meaning it is designed to run on cloud infrastructure. It abstracts away much of the infrastructure management, allowing users to focus on data and analytics rather than hardware and maintenance.

Data Sharing and Collaboration
  • Traditional database: Sharing data across organizations or with external partners can be complex and might involve exporting and importing data, raising security concerns.
  • Snowflake: Built-in data sharing capabilities enable organizations to securely share data between different Snowflake accounts without the need to copy or move data. This is particularly beneficial for collaborations and data monetization.

Concurrency and Isolation
  • Traditional database: Can struggle to handle multiple concurrent queries or users, leading to performance bottlenecks.
  • Snowflake: Designed to handle high levels of concurrency. Workloads can run on separate virtual warehouses, and a multi-cluster warehouse can add clusters as the number of concurrent queries grows, which isolates workloads and minimizes contention for resources.

Data Storage
  • Traditional database: Usually uses a row-based storage model, which might not be optimized for analytical workloads.
  • Snowflake: Uses a columnar storage model that is well suited to analytical queries. This improves query performance and reduces storage requirements for large datasets.

Cost / Billing Model
  • Traditional database: Often requires significant upfront investment in hardware and ongoing maintenance costs.
  • Snowflake: Operates on a pay-as-you-go pricing model, where you only pay for the resources you use. This can be more cost-effective, especially for organizations with varying workloads. This consumption-based model can work very well for subscription / PAYG apps and services.
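
To make the pay-as-you-go model concrete, here is a rough Python sketch of how compute cost might be estimated. It assumes standard virtual warehouses whose credits per hour double with each size step, and per-second billing with a 60-second minimum each time a warehouse starts or resumes. The price per credit is a placeholder, since actual prices depend on edition, region, and contract; the function names are illustrative, not part of any Snowflake API.

```python
# Assumed credits per hour for standard warehouses (doubling per size step).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# Assumption: each start or resume is billed for at least 60 seconds.
MINIMUM_BILLED_SECONDS = 60


def estimate_credits(size, run_seconds):
    """Estimate credits for a list of run durations (seconds), one per start/resume."""
    billed_seconds = sum(max(seconds, MINIMUM_BILLED_SECONDS) for seconds in run_seconds)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def estimate_cost(size, run_seconds, price_per_credit):
    """Convert estimated credits to currency using a caller-supplied price."""
    return estimate_credits(size, run_seconds) * price_per_credit


# Three runs of a MEDIUM warehouse; the 30-second run is billed as 60 seconds.
runs = [30, 1740, 1800]
credits = estimate_credits("MEDIUM", runs)
cost = estimate_cost("MEDIUM", runs, price_per_credit=3.00)  # placeholder price
print(credits, cost)
```

The sketch covers compute only; storage and cloud services charges are billed separately and are not modeled here.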

What is the Snowflake architecture?

Snowflake’s architecture consists of three main layers:

  • Cloud services: This layer coordinates activity across Snowflake, including authentication, access control, metadata management, and query parsing and optimization. It eliminates the need for manual data warehouse management and tuning, and users interact with the platform through ANSI-standard SQL.
  • Query processing (compute): The compute layer of Snowflake is made up of virtual warehouses that let you analyze data. Each virtual warehouse is an independent compute cluster that does not share computing resources with other warehouses, so the workload on one warehouse does not affect the performance of another. This makes it much easier to run concurrent workloads side by side.
  • Database storage: The database storage layer holds all data loaded into Snowflake, including structured and semi-structured data. Snowflake automatically manages all parts of the data storage process, including organization, structure, metadata, file size, compression, and statistics.

See: Key Concepts & Architecture — Snowflake Documentation for more detail.
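
As a small illustration of independent compute, the Python sketch below builds `CREATE WAREHOUSE` statements for two separate workloads. The warehouse names, sizes, and the `create_warehouse_sql` helper are hypothetical; the code only builds SQL text and does not connect to Snowflake, so check the statements against the Snowflake SQL reference before using them.

```python
def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=60):
    """Build a CREATE WAREHOUSE statement for one independent compute cluster."""
    # Basic guard so that only simple identifiers end up in the SQL text.
    if not name.isidentifier():
        raise ValueError(f"unsupported warehouse name: {name!r}")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "
        f"AUTO_RESUME = TRUE"
    )


# One warehouse per workload: loading and BI queries do not share compute.
statements = [
    create_warehouse_sql("LOAD_WH", size="SMALL"),
    create_warehouse_sql("BI_WH", size="MEDIUM", auto_suspend_seconds=300),
]

for statement in statements:
    print(statement)
```

Both warehouses would read the same data in the storage layer; only the compute is separate, so each can be sized and suspended on its own schedule.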


On which clouds is Snowflake supported? Is Snowflake available on AWS, Azure or GCP?

Snowflake is provided as a fully managed service that runs completely on cloud infrastructure. This means that all three layers of Snowflake’s architecture (storage, compute, and cloud services) are deployed and managed entirely on a selected cloud platform.

A Snowflake account can be hosted on any of the following cloud platforms:

  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)
  • Microsoft Azure (Azure)

What are some real-world use cases and success stories of Snowflake implementation?

Snowflake maintains a customer case study site that gives a good overview of use cases and of the industries and verticals in which Snowflake is used; see: Snowflake Customer Stories | Snowflake Data Cloud.

Enterprise finance, healthcare and retail operations are particularly strong markets for Snowflake.


What are the top monitoring and dashboard tools for Snowflake?

Popular tools used to monitor and troubleshoot Snowflake include: Datadog, eG Enterprise, New Relic, Microsoft Power BI, Tableau, Talend, Qlik and Sigma Analytics.

Because Snowflake is cloud-native and billed on consumption, many users opt for a tool that provides a single console view of not only Snowflake but also the underlying cloud platform and its dependencies, including cloud billing.
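
As a simple illustration of consumption monitoring, the Python sketch below totals credits per warehouse and flags warehouses that exceed a budget. The rows are hypothetical sample data, shaped loosely like the output of Snowflake's `WAREHOUSE_METERING_HISTORY` account usage view; the helper names and the budget figure are assumptions for illustration, and a real tool would query Snowflake rather than use a hard-coded list.

```python
from collections import defaultdict

# Hypothetical sample rows; in practice these would come from a usage query.
metering_rows = [
    {"warehouse_name": "LOAD_WH", "credits_used": 1.5},
    {"warehouse_name": "BI_WH", "credits_used": 4.0},
    {"warehouse_name": "LOAD_WH", "credits_used": 0.5},
]


def credits_by_warehouse(rows):
    """Sum credits used per warehouse."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["warehouse_name"]] += row["credits_used"]
    return dict(totals)


def over_budget(totals, budget_credits):
    """Return the names of warehouses whose total exceeds the budget, sorted."""
    return sorted(name for name, used in totals.items() if used > budget_credits)


totals = credits_by_warehouse(metering_rows)
print(totals)
print(over_budget(totals, budget_credits=3.0))
```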

To learn about eG Enterprise support for Snowflake and Snowpipe monitoring, please see: Snowflake Monitoring and Performance Management (eginnovations.com).